Prior Art Search using International Patent Classification Codes and All-Claims-Queries

Authors

  • György Szarvas
  • Benjamin Herbert
  • Iryna Gurevych
Abstract

In this study, we describe our system for the Intellectual Property track of the 2009 Cross-Language Evaluation Forum campaign (CLEF-IP). The CLEF-IP track addressed prior art search for patent applications. We used the Apache Lucene IR library to conduct experiments with the traditional TF-IDF-based ranking approach, indexing both the textual content of each patent and the IPC codes assigned to each document. We formulated our queries from all claims and the title of a patent application in order to measure the (weighted) lexical overlap between topics and prior art candidates. We also formulated a language-independent query from the IPC codes of a document to improve coverage and to obtain a more accurate ranking of candidates. Additionally, we used the IPC taxonomy (the categories and their short descriptive texts) to create a Concept Based Query Expansion [14] model for measuring the semantic overlap between topics and prior art candidates, and tried to incorporate this information into our system's ranking process. Probably because the definition texts in the IPC taxonomy (used to define the concept mapping of our model) are too short, incorporating the concept-based similarity measure did not improve our performance, and it was therefore excluded from the final submission. Using the extended Boolean vector space model of Lucene, our system remained efficient and still yielded fair performance: it achieved the 6th best Mean Average Precision score out of 14 participating systems on 500 topics, and the 4th best score out of 9 participants on 10,000 topics.
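The ranking idea in the abstract, lexical overlap from an all-claims-plus-title query combined with a language-independent IPC-code match, can be illustrated with a minimal sketch. This is not the authors' implementation and it does not reproduce Lucene's scoring formula. The tokenizer, the simplified TF-IDF weighting, the `ipc_weight` parameter, and the toy documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    # Smoothed inverse document frequency over the candidate collection.
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d["text"])))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def text_score(query_tokens, doc_text, idf):
    # Simplified TF-IDF overlap: sqrt(tf) * idf^2 summed over shared terms.
    tf = Counter(tokenize(doc_text))
    return sum(math.sqrt(tf[t]) * idf.get(t, 0.0) ** 2 for t in set(query_tokens))


def ipc_score(query_ipc, doc_ipc):
    # Language-independent signal: number of shared IPC codes.
    return len(set(query_ipc) & set(doc_ipc))


def rank(topic, docs, ipc_weight=1.0):
    # The query is built from the title and all claims of the topic patent.
    idf = build_idf(docs)
    query = tokenize(topic["title"] + " " + " ".join(topic["claims"]))
    scored = [
        (text_score(query, d["text"], idf)
         + ipc_weight * ipc_score(topic["ipc"], d["ipc"]), d["id"])
        for d in docs
    ]
    # Highest score first; ties broken by document id.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Toy data, invented for this example.
docs = [
    {"id": "A", "text": "a rotor blade for a wind turbine with a pitch control", "ipc": ["F03D"]},
    {"id": "B", "text": "a pharmaceutical composition comprising an antibody", "ipc": ["A61K"]},
    {"id": "C", "text": "a gear housing assembly", "ipc": ["F03D"]},
]
topic = {
    "title": "Wind turbine blade",
    "claims": ["A blade for a wind turbine comprising a pitch mechanism."],
    "ipc": ["F03D"],
}

print(rank(topic, docs, ipc_weight=0.0))
print(rank(topic, docs, ipc_weight=10.0))
```

In this toy setup, raising `ipc_weight` moves a candidate that shares the topic's IPC code but little vocabulary above one that matches only on incidental words, which is the coverage effect the abstract attributes to the IPC query.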

Similar resources

Prior Art Search and Its Evaluation

Prior Art Search is an information-seeking task where searchers, for instance patent examiners, search for published literature to determine whether the claimed invention in a patent application is novel. In Prior Art Search, search tasks are often time-sensitive and consist of rich information needs with multiple aspects/subtopics. In this thesis, we explore information retrieval techniques and...

Query Formulation for Prior Art Search - Georgetown University at CLEF-IP 2013

Our group participated in the CLEF-IP 2013 Passage Retrieval starting from Claims task. We focus on formulating representative queries from various metadata that is embedded in a patent document. We then submit the queries to a state-of-the-art search engine to perform document level retrieval. For passage level retrieval, we implement a TF-IDF algorithm that calculates the sum of query keyword...

Exploring Keyphrase Extraction and IPC Classification Vectors for Prior Art Search

In this paper we describe experiments conducted for the CLEF-IP 2011 Prior Art Retrieval track. We examined the impact of 1) using key phrase extraction to generate queries from the input patent and 2) using the citation network and the International Patent Classification (IPC) class vector in ranking patents. Variations of a popular key phrase extraction technique were explored for extracting and scoring ...

Strategies for Effective Chemical Information Retrieval

We participated in the technology survey and prior art search subtasks of the TREC 2009 Chemical IR Track. This paper describes the methods developed for these two tasks. For the technology survey task, we propose a method that constructs highly structured queries to do retrieval on different fields of chemical patents and documents in a weighted way. The proposed method i) enriches these struc...

Query Enhancement for Patent Prior-Art-Search Based on Keyterm Dependency Relations and Semantic Tags

Prior art search is one of the most common forms of patent search, whose goal is to find patent documents that constitute prior art for a given patent being examined. Current patent search systems are mostly keyword-based, and due to the unique characteristics of patents and their usage, such as embedded structure and the length of patent documents, there is room for further improvement. In ...

Publication date: 2009